## [1] "Excluded 3 participants based on catch-trial performance."
We further exclude participants who appear to give ratings that are independent of the scene they are seeing. To quantify this, we compute each participant’s mean rating for each utterance across all trials and then correlate the participant’s actual ratings with these per-utterance means. A high correlation is unexpected: it indicates that the participant’s ratings varied with the utterance but hardly at all with the scene, that is, that they responded without regard to the scene. We therefore also exclude the data from participants for whom this correlation is larger than 0.75.
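The criterion above can be sketched in base R as follows. This is a minimal illustration on simulated data, not the original analysis code: the column name `rating`, the modal `might`, and the toy participants are assumptions introduced here, while `workerid`, `modal`, and `percentage_blue` follow the grouping variables that appear in the output.

```r
# Simulated toy data (not the study data); `rating` and "might" are assumed names.
set.seed(1)
d <- expand.grid(workerid = 1:3,
                 modal = c("might", "probably"),
                 percentage_blue = c(0, 25, 50, 75, 100),
                 stringsAsFactors = FALSE)

# Participants 1 and 2 track the scene; participant 3 ignores it and
# gives roughly the same rating for each utterance on every trial.
d$rating <- ifelse(d$workerid == 3,
                   ifelse(d$modal == "might", 20, 80),
                   d$percentage_blue) + rnorm(nrow(d), sd = 5)

# Mean rating per participant and utterance, repeated on every trial row.
d$mean_rating <- ave(d$rating, d$workerid, d$modal)

# Correlation between actual ratings and per-utterance means, per participant.
cors <- sapply(split(d, d$workerid),
               function(p) cor(p$rating, p$mean_rating))

# Participants above the 0.75 threshold are flagged for exclusion.
excluded <- names(cors)[cors > 0.75]
print(round(cors, 2))
print(excluded)
```

In this toy example only the participant who ignores the scene exceeds the threshold, because their ratings are almost fully explained by the utterance alone.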
## [1] "Excluded 0 participants based on random responses."
We use the AUC function with the spline method to compute the AUC directly.
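As a rough illustration of what a spline-based AUC computes, the sketch below fits a natural cubic spline through the points and integrates it over the observed range. It uses only base R (`splinefun`, `integrate`) rather than the `AUC` function itself, whose implementation may differ in detail; the helper name `spline_auc` and the x and y values are hypothetical.

```r
# Hypothetical helper: area under a natural cubic spline through (x, y).
spline_auc <- function(x, y) {
  f <- splinefun(x, y, method = "natural")
  integrate(f, lower = min(x), upper = max(x))$value
}

# Invented example: mean ratings at five scene percentages.
x <- c(0, 25, 50, 75, 100)
y <- c(5, 20, 50, 80, 95)
print(spline_auc(x, y))
```

Compared with the trapezoid rule, the spline method integrates a smooth curve through the points, which matters most when the rating curve bends sharply between the sampled percentages.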
t-test and regression model with control variables:
##
## Two Sample t-test
##
## data: aucs.cautious$auc_diff and aucs.confident$auc_diff
## t = 2.8557, df = 120, p-value = 0.005062
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.36670 18.58957
## sample estimates:
## mean of x mean of y
## 17.239544 6.261407
##
## Call:
## lm(formula = auc_diff ~ cond + test_order + first_speaker_type +
## confident_speaker, data = rbind(aucs.cautious, aucs.confident))
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.687 -13.660 -0.413 11.727 61.374
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.050 3.979 1.018 0.31094
## condconfident (probably-biased) -10.978 3.601 -3.048 0.00284 **
## test_orderreverse 9.655 3.610 2.674 0.00856 **
## first_speaker_typeconfident 10.918 3.606 3.027 0.00303 **
## confident_speakerconfidentm 5.421 3.610 1.502 0.13586
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.89 on 117 degrees of freedom
## Multiple R-squared: 0.1989, Adjusted R-squared: 0.1715
## F-statistic: 7.26 on 4 and 117 DF, p-value: 2.941e-05
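The output above is consistent with a pooled-variance t-test (the title "Two Sample t-test" and df = 120 for two groups of 61) and with a linear model in which the `cond` coefficient is the difference in group means (6.261 - 17.240 = -10.978). The sketch below reproduces that relationship on simulated data; the data values are invented, the condition label is simplified to `confident`, and the control variables are omitted.

```r
# Simulated stand-ins for aucs.cautious and aucs.confident (not the study data).
set.seed(42)
n <- 61
aucs.cautious  <- data.frame(cond = "cautious",
                             auc_diff = rnorm(n, mean = 17, sd = 20))
aucs.confident <- data.frame(cond = "confident",
                             auc_diff = rnorm(n, mean = 6, sd = 20))

# var.equal = TRUE requests the pooled-variance test with 2n - 2 degrees of freedom.
tt <- t.test(aucs.cautious$auc_diff, aucs.confident$auc_diff, var.equal = TRUE)

# With cond as the only predictor, its coefficient is the difference in group means
# and its t value matches the pooled t-test up to sign.
fit <- lm(auc_diff ~ cond, data = rbind(aucs.cautious, aucs.confident))
cond_coef <- coef(fit)[["condconfident"]]
cond_t <- summary(fit)$coefficients["condconfident", "t value"]

print(tt)
print(c(coefficient = cond_coef, t_value = cond_t))
```

Once balanced control variables such as `test_order` are added, the coefficient stays close to the raw mean difference while its standard error and t value change, which is the pattern in the regression table above.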
library(mclust)
## Package 'mclust' version 5.4.10
## Type 'citation("mclust")' for citing this R package in publications.
##
## Attaching package: 'mclust'
## The following object is masked from 'package:DescTools':
##
## BrierScore
## The following object is masked from 'package:bootstrap':
##
## diabetes
aucs_diff = merge(aucs.cautious, aucs.confident, by=c("workerid"))
aucs_diff$diff_of_diffs = aucs_diff$auc_diff.x - aucs_diff$auc_diff.y
aucs_diff %>% ggplot(aes(x=diff_of_diffs)) + geom_density() + geom_jitter(aes(y=0), width=0, height=0.001) + ggtitle("Raw data + estimated density")
1 Cluster
fit1 = Mclust(aucs_diff$diff_of_diffs, G=1)
print(summary(fit1, parameters=TRUE))
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust X (univariate normal) model with 1 component:
##
## log-likelihood n df BIC ICL
## -281.9502 61 2 -572.1221 -572.1221
##
## Clustering table:
## 1
## 61
##
## Mixing probabilities:
## 1
## 1
##
## Means:
## [1] 10.97814
##
## Variances:
## [1] 605.704
2 Clusters
fit2 = Mclust(aucs_diff$diff_of_diffs, G=2)
print(summary(fit2, parameters=T))
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust E (univariate, equal variance) model with 2 components:
##
## log-likelihood n df BIC ICL
## -275.9833 61 4 -568.4101 -576.1991
##
## Clustering table:
## 1 2
## 51 10
##
## Mixing probabilities:
## 1 2
## 0.8205388 0.1794612
##
## Means:
## 1 2
## 2.048788 51.805221
##
## Variances:
## 1 2
## 241.1448 241.1448
3 Clusters
fit3 = Mclust(aucs_diff$diff_of_diffs, G=3)
print(summary(fit3, parameters=T))
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust E (univariate, equal variance) model with 3 components:
##
## log-likelihood n df BIC ICL
## -276.006 61 6 -576.6773 -631.9808
##
## Clustering table:
## 1 2 3
## 8 43 10
##
## Mixing probabilities:
## 1 2 3
## 0.3230834 0.5026031 0.1743135
##
## Means:
## 1 2 3
## -2.083159 4.999926 52.423891
##
## Variances:
## 1 2 3
## 233.1972 233.1972 233.1972
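According to the Bayesian information criterion (BIC), the model with two clusters describes the data best: its BIC of -568.41 is higher than that of the one-cluster model (-572.12) and the three-cluster model (-576.68). Note that mclust reports BIC on a scale where larger values indicate a better model.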
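The BIC values can be checked by hand from the log-likelihoods, sample size, and degrees of freedom printed in the three summaries. mclust defines BIC as 2 * log-likelihood - df * log(n), so the largest value wins. The short sketch below uses only the numbers reported above; the vector names are introduced here for illustration.

```r
# Values copied from the printed Mclust summaries for G = 1, 2, 3.
loglik <- c(G1 = -281.9502, G2 = -275.9833, G3 = -276.006)
df     <- c(G1 = 2, G2 = 4, G3 = 6)
n      <- 61

# mclust convention: larger BIC is better.
bic <- 2 * loglik - df * log(n)
print(round(bic, 2))

# Name of the best model under this criterion.
best <- names(which.max(bic))
print(best)
```

The three-cluster model has almost the same log-likelihood as the two-cluster model but pays for two extra parameters, which is why its BIC is the lowest of the three.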
Fitted model:
aucs_diff %>%
ggplot(aes(x=diff_of_diffs)) +
geom_jitter(aes(y=0, color=first_speaker_type.x), width=0, height=0.001) +
ggtitle("Raw data + Components of gaussian mixture") +
# fit2 is an equal-variance ("E") model, so sigmasq holds one shared variance;
# indexing sigmasq[2] returns NA and the second component would not be drawn.
stat_function(fun = dnorm, args = list(mean = fit2$parameters$mean[1], sd = sqrt(fit2$parameters$variance$sigmasq[1]))) +
stat_function(fun = dnorm, args = list(mean = fit2$parameters$mean[2], sd = sqrt(fit2$parameters$variance$sigmasq[1])))
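The plot draws each component as an unweighted normal density. As a complement, the sketch below weights the components by their mixing probabilities to obtain the fitted mixture density and the posterior probability that an observation belongs to the second cluster. The parameter values are copied from the two-cluster summary above; the function names are hypothetical.

```r
# Parameters copied from the printed summary of the two-cluster model.
pro   <- c(0.8205388, 0.1794612)   # mixing probabilities
mu    <- c(2.048788, 51.805221)    # component means
sigma <- sqrt(241.1448)            # shared standard deviation

# Fitted mixture density: components weighted by their mixing probabilities.
mixture_density <- function(x) {
  pro[1] * dnorm(x, mu[1], sigma) + pro[2] * dnorm(x, mu[2], sigma)
}

# Posterior probability that a value x belongs to the second component.
posterior_2 <- function(x) {
  pro[2] * dnorm(x, mu[2], sigma) / mixture_density(x)
}

print(posterior_2(c(0, 25, 50)))
```

Passing `mixture_density` to `stat_function` would overlay the full mixture on the raw data, which makes the relative size of the two clusters visible in a way the unweighted components do not.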
## # A tibble: 244 × 5
## workerid condition most_likely_model name value
## <int> <chr> <chr> <chr> <dbl>
## 1 1436 cautious confident likelihood.cautious -839.
## 2 1436 cautious confident likelihood.confident -800.
## 3 1436 confident cautious likelihood.cautious -945.
## 4 1436 confident cautious likelihood.confident -970.
## 5 1437 cautious cautious likelihood.cautious -413.
## 6 1437 cautious cautious likelihood.confident -537.
## 7 1437 confident cautious likelihood.cautious -307.
## 8 1437 confident cautious likelihood.confident -496.
## 9 1438 cautious cautious likelihood.cautious -589.
## 10 1438 cautious cautious likelihood.confident -641.
## # … with 234 more rows
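The rows shown above suggest how `most_likely_model` and `aligned_count` relate: the most likely model is the one with the higher log-likelihood, and a participant's aligned count appears to be the number of conditions in which that model matches the condition. The sketch below applies this reading to the two participants whose rows are fully visible, using the rounded values as printed; it is an interpretation of the output, not the original code.

```r
# Rounded log-likelihoods copied from the printed rows for workers 1436 and 1437.
lik <- data.frame(
  workerid             = c(1436, 1436, 1437, 1437),
  condition            = c("cautious", "confident", "cautious", "confident"),
  likelihood.cautious  = c(-839, -945, -413, -307),
  likelihood.confident = c(-800, -970, -537, -496)
)

# The most likely model is the one with the higher log-likelihood.
lik$most_likely_model <- ifelse(lik$likelihood.cautious > lik$likelihood.confident,
                                "cautious", "confident")

# Count, per participant, the conditions where the model matches the condition.
lik$aligned <- lik$most_likely_model == lik$condition
aligned_count <- tapply(lik$aligned, lik$workerid, sum)
print(lik)
print(aligned_count)
```

Under this reading, worker 1436 has an aligned count of 0 and worker 1437 has a count of 1, which agrees with 1436 appearing in the table of participants with `aligned_count` 0 and 1437 appearing in neither table.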

Participants with aligned_count = 2:

| workerid | first_speaker_type | test_order | noticed_manipulation | cautious_count | confident_count | aligned_count | first_adaptation_speaker_count |
|---|---|---|---|---|---|---|---|
| 1438 | cautious | parallel | 1 | 1 | 1 | 2 | 1 |
| 1439 | confident | parallel | 0 | 1 | 1 | 2 | 1 |
| 1443 | cautious | reverse | 1 | 1 | 1 | 2 | 1 |
| 1447 | cautious | parallel | 0 | 1 | 1 | 2 | 1 |
| 1453 | confident | parallel | 0 | 1 | 1 | 2 | 1 |
| 1461 | confident | reverse | 0 | 1 | 1 | 2 | 1 |
| 1466 | cautious | reverse | 1 | 1 | 1 | 2 | 1 |
| 1467 | cautious | reverse | 0 | 1 | 1 | 2 | 1 |
| 1472 | cautious | reverse | 0 | 1 | 1 | 2 | 1 |
| 1477 | cautious | parallel | 1 | 1 | 1 | 2 | 1 |
| 1480 | cautious | parallel | 1 | 1 | 1 | 2 | 1 |
| 1487 | confident | parallel | 1 | 1 | 1 | 2 | 1 |
| 1490 | cautious | reverse | 0 | 1 | 1 | 2 | 1 |
| 1491 | cautious | reverse | 1 | 1 | 1 | 2 | 1 |
| 1496 | cautious | reverse | 1 | 1 | 1 | 2 | 1 |
| 1497 | confident | reverse | 1 | 1 | 1 | 2 | 1 |
| 1501 | cautious | reverse | 1 | 1 | 1 | 2 | 1 |

Participants with aligned_count = 0:

| workerid | first_speaker_type | test_order | noticed_manipulation | cautious_count | confident_count | aligned_count | first_adaptation_speaker_count |
|---|---|---|---|---|---|---|---|
| 1436 | cautious | parallel | 1 | 1 | 1 | 0 | 1 |
| 1440 | confident | parallel | 0 | 1 | 1 | 0 | 1 |
| 1459 | cautious | reverse | 0 | 1 | 1 | 0 | 1 |
| 1470 | confident | reverse | 0 | 1 | 1 | 0 | 1 |
| 1471 | confident | reverse | 0 | 1 | 1 | 0 | 1 |
| 1473 | confident | parallel | 0 | 1 | 1 | 0 | 1 |
| 1474 | confident | parallel | 0 | 1 | 1 | 0 | 1 |
| 1485 | confident | parallel | 0 | 1 | 1 | 0 | 1 |